Skip to content

Conversation

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Oct 9, 2024

Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time.

  • Characters not encodable in the current locale are now acceptable in the format string.
  • Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated.
  • Embedded null character no longer terminates the format string.

Closes #52551, #78662 and #124531.

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.
@serhiy-storchaka serhiy-storchaka added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-52551: Fix encoding issues in strftime() gh-52551, gh-78662, gh-124531: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-52551, gh-78662, gh-124531: Fix encoding issues in strftime() gh-124531: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-124531: Fix encoding issues in strftime() gh-78662: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-78662: Fix encoding issues in strftime() gh-52551: Fix encoding issues in strftime() Oct 9, 2024
@serhiy-storchaka
Copy link
Member Author

@vstinner, could you please review this PR?

newformat, timetuple, NULL);
Py_DECREF(newformat);
}
if (0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if (0) is weird. Why not adding a label after _PyUnicodeWriter_Dealloc() and using goto?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require changing more code. Renaming label Done to Error (there are many goto Done in not changed code) and adding new Done label. The if (0) idiom is weird, but used in many places.

}
newformat = _PyUnicodeWriter_Finish(&writer);
}
if (newformat != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to exit early: if (newformat == NULL) goto ....

By the way, only the _PyUnicodeWriter_Finish() code path can fail, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require adding more labels. The flow would look more complicated.

* will be ahead of time...
*/
while (1) {
outbuf = (time_char *)PyMem_Realloc(outbuf, bufsize*sizeof(time_char));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also add an explicit integer overflow check here.

Py_DECREF(unicode);
}

j = i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should declare the j variable here, and I suggest to rename it to start.

}
format[fmtlen++] = (char)c;
}
if (fmtlen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is quite big. You might move the code handling format in a sub-function (the whole if block).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be inconvenient, because this code changes several outer variables.: outbuf and bufsize.

@serhiy-storchaka
Copy link
Member Author

I addressed all comments. Perhaps it is now more difficult to review.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just left minor coding style suggestions that you can ignore.

Thanks for the exhaustive added tests, it's very helpful.

Comment on lines 825 to 826
*outbuf = (time_char *)PyMem_Realloc(*outbuf,
*bufsize*sizeof(time_char));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*outbuf = (time_char *)PyMem_Realloc(*outbuf,
*bufsize*sizeof(time_char));
*outbuf = (time_char *)PyMem_Realloc(*outbuf,
*bufsize*sizeof(time_char));


static PyObject *
time_strftime(PyObject *module, PyObject *args)
strftime1(time_char **outbuf, size_t *bufsize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: you might rename the function to time_strftime1() to avoid any conflict with a potential strftime1() name from the C library.

@serhiy-storchaka
Copy link
Member Author

Thank you for your review @vstinner.

@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) October 17, 2024 15:19
@serhiy-storchaka serhiy-storchaka merged commit ad3eac1 into python:main Oct 17, 2024
37 checks passed
@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ad3eac1963a5f195ef9b2c1dbb5e44fa3cce4c72 3.13

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ad3eac1963a5f195ef9b2c1dbb5e44fa3cce4c72 3.12

@serhiy-storchaka serhiy-storchaka deleted the strftime branch October 17, 2024 16:18
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 17, 2024
…5193)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.
(cherry picked from commit ad3eac1)

Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 17, 2024
Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.

(cherry picked from commit ad3eac1)
@bedevere-app
Copy link

bedevere-app bot commented Oct 17, 2024

GH-125657 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 17, 2024
@serhiy-storchaka serhiy-storchaka removed the needs backport to 3.12 only security fixes label Oct 17, 2024
@serhiy-storchaka serhiy-storchaka removed their assignment Oct 17, 2024
serhiy-storchaka added a commit that referenced this pull request Oct 17, 2024
…5657)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also gh-78662 and gh-124531.

(cherry picked from commit ad3eac1)
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 17, 2024
…5193) (pythonGH-125657)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.

(cherry picked from commit 08ccbb9)

Co-authored-by: Serhiy Storchaka <[email protected]>
(cherry picked from commit ad3eac1)
serhiy-storchaka added a commit that referenced this pull request Oct 17, 2024
…5657) (GH-125661)

Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also gh-78662 and gh-124531.

(cherry picked from commit 08ccbb9)
(cherry picked from commit ad3eac1)

Co-authored-by: Serhiy Storchaka <[email protected]>
@vstinner
Copy link
Member

Congrats! That's a nice change. Using Unicode instead of bytes for the internal buffer is a good move.

ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
Fix time.strftime(), the strftime() method and formatting of the
datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in
  the format string.
* Surrogate pairs and sequence of surrogatescape-encoded bytes are no
  longer recombinated.
* Embedded null character no longer terminates the format string.

This fixes also pythongh-78662 and pythongh-124531.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

datetime.strftime strings can be terminated by "\x00" literals time.strftime() and Unicode characters on Windows

2 participants